------------------------------------------------------------------------------------------------------------------------------------------
      name:  <unnamed>
       log:  E:\REStat_MS14767_Vol96(2)\Data preparation Compustat\9_match_cq.log
  log type:  text
 opened on:  18 Dec 2014, 16:39:03

. *-----------------------------------------------------------------
. 
. *****************************************
. * starting data2.dta to master_compustat_all_countries.dta
. * This file reconstructs the connection between match.dta and compustat.dta
. * so that the data from compustat dataset may later be merged with patent data.
. * It is equivalent to the patents_match.do file.
. * Match.dta contains 5000 entries.
. **************************************************
. 
. set mem 600m

. 
. use "clean_compustat_master.dta"

. 
. keep CUSIP ticker name SIC4 countryinc

. rename CUSIP cusip

. rename SIC4 sic4

. so cusip

. drop if cusip==cusip[_n-1]
(311931 observations deleted)

. 
. ****************************************
. * In order to find matches within match.dta, 9-digit cusip is changed to 6-digit cusip
. * and renamed. cusip9 is the original cusip.
. ************************************
. 
. gen cusip6 = substr(cusip, 1, length(cusip) - 3)

. rename cusip cusip9

. rename cusip6 cusip

. so cusip

. label var cusip "6-digit cusip from compustat"

. label var cusip9 "9-digit cusip from compustat" 

. 
. save "patent files\clean_compustat_master_red.dta", replace
file patent files\clean_compustat_master_red.dta saved

. clear

. 
. ***************************
. * match.dta has cusip as identifier, contains 5,000 entries compared to 
. * 23,000 entries in compustat
. ******************************
. 
. use "patent files\patents\match.dta"

. so cusip

. 
. merge cusip using "patent files\clean_compustat_master_red.dta"
variable cusip does not uniquely identify observations in the master data
variable cusip does not uniquely identify observations in patent files\clean_compustat_master_red.dta

. 
. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,680        6.15        6.15
          2 |     22,413       82.03       88.18
          3 |      3,231       11.82      100.00
------------+-----------------------------------
      Total |     27,324      100.00

. 
. ********************************
. *   _merge |      Freq.     Percent        Cum.
. *------------+-----------------------------------
. *          1 |      1,680        6.15        6.15
. *          2 |     22,413       82.03       88.18
. *          3 |      3,231       11.82      100.00
. *------------+-----------------------------------
. *      Total |     27,324      100.00
. **************************************
. 
. drop if _merge==2
(22413 observations deleted)

. gen cusip_match=0

. replace cusip_match=1 if _merge==3
(3231 real changes made)

. drop _merge

. 
. gen cusip_name=name
(1680 missing values generated)

. replace name=cname
name was str28 now str30
(1828 real changes made)

. rename cusip9 cusip_cusip9

. so name

. 
. label var assignee "assignee identifier"

. label var cname "name in compustat data set"

. label var cusip "unique identifier in compustat data set"

. label var pname "name of parent firm"

. label var sname "name of subsidary firm"

. label var assname "patent assignee name "

. label var cusip_cusip9 "9-digit cusip number"

. label var cusip_name "original compustat name"

. label var cusip_match "valid cusip connection between match and compustat"

. 
. desc

Contains data from patent files\patents\match.dta
  obs:         4,911                          
 vars:            14                          25 Oct 2006 17:29
 size:     1,301,415                          
------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------------------
assignee        long    %12.0g                assignee identifier
assname         str59   %59s                  patent assignee name
cname           str30   %30s                  name in compustat data set
cusip           str6    %9s                   unique identifier in compustat data set
own             byte    %8.0g                 
pname           str33   %33s                  name of parent firm
sname           str50   %50s                  name of subsidary firm
ticker          str8    %9s                   Ticker Symbol
cusip_cusip9    str9    %9s                   9-digit cusip number
name            str30   %30s                  Company Name
countryinc      byte    %10.0g                Country of Incorporation
sic4            int     %8.0g                 primary sic code
cusip_match     float   %9.0g                 valid cusip connection between match and compustat
cusip_name      str28   %28s                  original compustat name
------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  name
     Note:  dataset has changed since last saved

. 
. save "patent files\cusipmerged.dta", replace
file patent files\cusipmerged.dta saved

. clear

. 
. *****************************
. * This was merging cusip, now comes merging names with match.dta (now cusipmerged.dta)
. *******************************
. 
. use "patent files\clean_compustat_master_red.dta"

. drop cusip

. so name

. save "patent files\clean_compustat_master_red.dta", replace
file patent files\clean_compustat_master_red.dta saved

. clear

. 
. use "patent files\cusipmerged.dta"

. merge name using "patent files\clean_compustat_master_red.dta"
variable name does not uniquely identify observations in the master data
variable name does not uniquely identify observations in patent files\clean_compustat_master_red.dta

. 
. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,908        6.95        6.95
          2 |     22,548       82.12       89.06
          3 |      3,003       10.94      100.00
------------+-----------------------------------
      Total |     27,459      100.00

. drop if _merge==2
(22548 observations deleted)

. gen name_match=0

. replace name_match=1 if _merge==3
(3003 real changes made)

. drop if _merge==2
(0 observations deleted)

. drop _merge

. rename cusip9 name_cusip9

. count if cusip_match==0 & name_match==0
 1589

. 
. label var name_match "valid name connection between match and compustat"

. 
. save "patent files\cusipnamemerged.dta", replace
file patent files\cusipnamemerged.dta saved

. 
. count if name_match==1
 3003

. count if cusip_match==1
 3231

. 
. ****************************
. * Now the dataset contains 3003 name-matches and 3231 cusip-matches, 1589 non-matched***
. **********************************
. 
. insheet using "patent files\manual_matching_cq_ip.csv", clear
(5 vars, 174 obs)

. save "patent files\manual_matching_cq_ip.dta", replace
file patent files\manual_matching_cq_ip.dta saved

. drop in 1/1
(1 observation deleted)

. rename v1 manual_name

. rename v2 manual_cusip9

. rename v3 cname

. rename v4 cusip

. rename v5 error

. drop cusip manual_cusip

. so manual_name

. save "patent files\manual_matching_cq.dta", replace
file patent files\manual_matching_cq.dta saved

. 
. use "patent files\clean_compustat_master_red.dta", clear

. keep name cusip

. rename name manual_name

. rename cusip manual_cusip9

. so manual_name

. merge manual_name using "patent files\manual_matching_cq.dta"
variable manual_name does not uniquely identify observations in the master data
variable manual_name does not uniquely identify observations in patent files\manual_matching_cq.dta

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |     23,816       99.28       99.28
          2 |          1        0.00       99.28
          3 |        172        0.72      100.00
------------+-----------------------------------
      Total |     23,989      100.00

. keep if _merge==3
(23817 observations deleted)

. keep manual_name manual_cusip9 error cname

. so cname

. save "patent files\manual_matching_cq.dta", replace
file patent files\manual_matching_cq.dta saved

. 
. use "patent files\cusipnamemerged.dta", clear

. so cname

. merge cname using "patent files\manual_matching_cq.dta"
variable cname does not uniquely identify observations in the master data
variable cname does not uniquely identify observations in patent files\manual_matching_cq.dta

. gen manual=0

. replace manual=1 if _merge==3 & manual_name~="n"
(360 real changes made)

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      4,551       92.67       92.67
          3 |        360        7.33      100.00
------------+-----------------------------------
      Total |      4,911      100.00

. drop _merge

. 
. replace cname=manual_name if manual==1
(360 real changes made)

. drop manual_name

. replace cusip=manual_cusip9 if manual==1
cusip was str6 now str9
(360 real changes made)

. drop manual_cusip

. replace cname=cusip_name if cusip_match==1 & name_match==0
(319 real changes made)

. drop cusip_name

. replace cusip=name_cusip9 if cusip_match==0 & name_match==1
(91 real changes made)

. drop name_cusip9

. replace cusip=cusip_cusip9 if cusip_match==1
(3231 real changes made)

. drop cusip_cusip9

. 
. **********************************
. * The following steps prepare and examine the merge between compustat and cusipnamemerged
. * in order to append assignee numbers and names to compustat for later merge with patents
. **********************************
. 
. so assignee

. drop if assignee==assignee[_n-1]
(14 observations deleted)

. drop if cusip_match==0 & name_match==0 & manual==0
(1243 observations deleted)

. 
. save "patent files\cusipnamemerged.dta", replace
file patent files\cusipnamemerged.dta saved

. 
. use "patent files\clean_compustat_master_red.dta", clear

. rename cusip9 cusip

. so cusip

. save "patent files\clean_compustat_master_red.dta", replace
file patent files\clean_compustat_master_red.dta saved

. 
. use "patent files\cusipnamemerged.dta", clear

. so cusip

. merge cusip using "patent files\clean_compustat_master_red.dta"
variable cusip does not uniquely identify observations in the master data

. keep if _merge==3
(22261 observations deleted)

. tab _merge

     _merge |      Freq.     Percent        Cum.
------------+-----------------------------------
          3 |      3,654      100.00      100.00
------------+-----------------------------------
      Total |      3,654      100.00

. drop _merge

. 
. so cusip

. count if cusip!=cusip[_n-1]
 1710

. 
. *************************************************************
. * Here it is checked for non-american entries, 
. * those are dropped.
. ******************************************************
. 
. count if countryinc!=0 & countryinc!=.
   64

. * 64
. 
. drop if countryinc!=0 & countryinc!=.
(64 observations deleted)

. 
. save "patent files\cusipnamemerged.dta", replace
file patent files\cusipnamemerged.dta saved

. 
. keep assignee cusip ticker sic4

. so assignee

. desc

Contains data from patent files\cusipnamemerged.dta
  obs:         3,590                          
 vars:             4                          18 Dec 2014 16:39
 size:        82,570                          
------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------------------
assignee        long    %12.0g                assignee identifier
cusip           str9    %9s                   unique identifier in compustat data set
ticker          str8    %9s                   Ticker Symbol
sic4            int     %8.0g                 primary sic code
------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  assignee
     Note:  dataset has changed since last saved

. save "patent files\cusipnamemerged_red.dta", replace
file patent files\cusipnamemerged_red.dta saved

. 
. desc

Contains data from patent files\cusipnamemerged_red.dta
  obs:         3,590                          
 vars:             4                          18 Dec 2014 16:39
 size:        82,570                          
------------------------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------------------------
assignee        long    %12.0g                assignee identifier
cusip           str9    %9s                   unique identifier in compustat data set
ticker          str8    %9s                   Ticker Symbol
sic4            int     %8.0g                 primary sic code
------------------------------------------------------------------------------------------------------------------------------------------
Sorted by:  assignee

. log close
      name:  <unnamed>
       log:  E:\REStat_MS14767_Vol96(2)\Data preparation Compustat\9_match_cq.log
  log type:  text
 closed on:  18 Dec 2014, 16:39:09
------------------------------------------------------------------------------------------------------------------------------------------
